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With its systematic exploration of probability distributions, Hamiltonian Monte Carlo is a potent 
Markov Chain Monte Carlo technique; it is an approach, however, ultimately contingent on the 
choice of a suitable Hamiltonian function. By examining both the symplectic geometry underlying 
Hamiltonian dynamics and the requirements of Markov Chain Monte Carlo, we construct the gen- 
eral form of admissible Hamiltonians and propose a particular choice with potential application in 
Bayesian inference. 



PACS numbers: 02.70.Tt,02.40.Hw 

Since its introduction by Duane, et al. [l| and devel- 
opment by Neal 0], Hamiltonian Monte Carlo (HMC) 
has proven to be a powerful Markov Chain Monte Carlo 
methodology. By utilizing Hamiltonian dynamics as a 
Markov transition kernel, HMC coherently explores the 
space of a target distribution and results in rapidly mix- 
ing Markov chains. Such dynamics, however, are depen- 
dent on a particular Hamiltonian function and existing 
applications have been limited mostly to forms common 
in the study of physical systems. 

The feature of Hamiltonian dynamics that furnishes 
such efficient Markov chains is a consequence of not any 
particular Hamiltonian function, but rather an implicit 
symplectic geometry on the underlying parameter space. 
By first understanding the properties of this geometry, its 
relation to probability measures, and then the constraints 
of a well-posed transition kernel, we determine the most 
general form of HMC and discuss immediate extensions 
to current work. 



SYMPLECTIC GEOMETRY 

Symplectic geometry is the study of an intrinsic struc- 
ture of certain differentiable manifolds having applica- 
tions spanning the physical sciences Q. After introduc- 
ing the tangent and cotangent bundles that derive from 
a given base manifold, we show how the latter is natu- 
rally equipped with a symplectic geometry, and how that 
structure admits a fiow along the cotangent bundle. 



given point P E Ai 'looks like' R". Formally, a home- 
oniorphism ^pa '■ Oa Ua must exist from any open 
subset Oa C to an open subset Ua C M" such that 
the map ipp o ip~^ is continuously differentiable for any 
overlapping subsets Oa n O/j ^ 0. 

Of the many possible differentiable mappings between 
differentiable manifolds, two of the most important are 
curves, which map open sets of M to neighboring points 
on the manifold, and functions, which map each point P 
to M. The collection of all smooth curves and all smooth 
functions on M form differentiable manifolds in their own 
right, and these two manifolds naturally equip each point 
P E M. with two local vector spaces. Tangent vectors 
of all curves passing through P form an n-dimensional 
vector space known as the tangent space, Tp; the gradi- 
ents of all manifold functions in a neighborhood of point 
P span another ??-dimensional vector space denoted the 
cotangent space, Tp. These two vector spaces are dual 
in the sense that an element of Tp serves as a linear 
transformation mapping an element of Tp to M, and vice 
versa. 

A set of n functions, {'?'(f')}El ^^rs coordinates on the 
manifold if they uniquely identify each point, and any 
choice of coordinate functions induces natural bases for 
the tangent and cotangent spaces across M.: the direc- 
tional derivative vectors, = and gradients, dg', 
respectively. These coordinate bases provide an expan- 
sion of any element v G Tp or p € Tp, 



E 



V e,; 



The Tangent and Cotangent Bundles 



P 



^^Ptdq', 



A smooth, n-dimensional manifold Ai is defined as a 
topological space where the local neighborhood of any 
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where the components (resp. pi) are manifold func- 
tions depending on v (resp. p) and the q\ Note that the 



^ Note that the index here, i G {1, . . . , n}, is simply a label and 
need not, for example, be considered the components of a vector. 
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components uniquely identify each element of the vector 
spaces: when Tp and Tp are treated as manifolds, the 
component functions {ii*} (resp. {pi\) serve as proper 
coordinate functions on Tp (resp. Tp). 

Collecting all of the tangent spaces across the man- 
ifold yields another 2n-dimensional manifold called the 
tangent bundle, TA4. A point in TA4 corresponds to a 
specific vector in a particular vector space Tp and lo- 
cally the bundle factors into a trivial product space of 
the base manifold Ai and the tangent space Tp. Co- 
ordinates on TA4 then decompose into the direct sum 
of coordinates on the two manifolds, {q^,v^}. Likewise 
the cotangent spaces can be collected into the cotangent 
bundle T*M where each point is identified by the coordi- 
nates {q^,Pj}. In analogy with physical systems, the two 
bundles TAi and T*A4 arc also known as configuration 
space and phase space, respectively (Tabic 



The Symplectic Form 

A change of coordinates on the base manifold A4, 

induces a unique linear transformation on the basis one- 
forms dq' on A4, 

and on the coordinate decomposition of a one-form p, 

Note that the Jacobian and inverse Jacobian of the co- 
ordinate transformation, dq^/dQ^ and dQ'^/dq^ respec- 
tively, are matrix inverses, 

^ dQ^ dq^ _ „ 

dq3 ag'= • 

Coordinate transformations on M. also induce a fam- 
ily of transformations of the fields on T*J\A] these point 
transformations do not mix the and {pi\ coordi- 

nates and preserve the trivial fiber structure of the man- 
ifold. Specifically, the one-forms dg* over T*Ai (which 
are just the vertical lifts of the one-form fields dq* over 
M) transform as in Eq. ([1]), while the coordinate func- 
tions Pi transform as in Eq. ([2]). All of the coordinate 
functions, and their associated basis one- forms 

{dg*,dpj} are clearly dependent on the coordinates of 
the base manifold. 

Because the two transformations in Eq. ([T]) and Eq. ^ 
are inverses, however, there arc some natural objects on 



T*M^ independent of the manifold coordinates. In par- 
ticular, the one-form 6'0 

n 

e = ^ -p,dg' . 

is coordinate independent by construction. The exterior 
derivative dO, 

uj = Ae = ^dq' A dp, , (3) 

i 

is more subtle: although the transformation of dpi in- 
volves derivatives of the Jacobian, w is coordinate inde- 
pendent. 

uj' = Y^ dQ^ A dPj = dq' Adpi=uj. 

j i 

This can be seen either by explicit calculation Q , or more 
simply by noting that, because 9 is coordinate indepen- 
dent, the exterior derivative dO must be as well. 

This symplectic form, w, is a two- form, or anti- 
symmetric tensor of rank (0, 2), which maps a given vec- 
tor field X to a one-form X, 

X = uo{X,-). 

Because w is non-degenerate, this linear transformation 
is invertible, and any one-form Y over phase space also 
defines a unique vector field Y such that ojiY , ■) = Y . 
Starting from any function (q, p) on T*AA one can con- 
struct a one-form via the exterior derivative, dH , and, 
from this, define the Hamiltonian vector field Xh asso- 
ciated with H such that ui{Xh, •) — dH. 

Hamiltonian Flow 

By specifying a direction at each point P E T*Ai, 
the vector field Xh defines curves through the manifold. 
These curves cover the entire manifold without intersect- 
ing, mapping every point P to another along the local 
curve and generating a flow of the entire cotangent bun- 
dle. 

In particular, the vector field differentiates any func- 
tion / along the defined vector at each point, Xnif) = 

(^^H^ where = {q\pj^ arc the coordinates of 

T*Ai . If the curves are parametrized by t S K, then the 
action of the vector field is simply Xnif) = ^(/). Ap- 
plying the vector field Xh to the coordinates functions 



^ The resemblance of 8 and an expansion of a one-form p on M is 
a result of ambiguous notation. Here pi are coordinate functions 
on T*M, and dq' are one-forms on T*M. This one- form 6 (or 
its negation —9) is sometimes called the tautological form. 
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Manifold 


Manifold 


Coordinate 


Coordinate 




Name 


Functions 


Names 


M 


Position Space 




Position 


TM 


Configuration Space 




Position, Velocity 


T'M 


Phase Space 




Position, Momenta 



TABLE I. The tangent and cotangent bundles, and their coordinates, are often referenced in analogy to physical systems in 
classical mechanics. 



themselves yields ordinary differential equations of each 
coordinate along the integral curves, 

known as Hamilton's equations in classical mechanics 00 
Similarly, the rate of change of an arbitrary scalar func- 
tion, /, along the integral curves of Xh is given by 

= dfiXn) 

= u:{Xf,XH) 

dJ_dH_ df dH 
dq^ dpi dpi dq^ 

where {/, H} is known as the Poisson bracket. Note that 
the function / is invariant along integral curves if the 
Poisson bracket vanishes; in particular the initial function 
H, or Hamiltonian, is always preserved, 

dH ^ , , , 

— =Xh{H) = {H,H} = Q. 

The rate of change of a general geometric object along 
the integral curves of Xh is given by using the Lie deriva- 
tive, C By definition the symplectic form is closed, 

dw = 0, and as a result the Lie derivative along Xh of lo 
vanishes, 



Consequently, the Lie derivative of the top-rank differen- 
tial volume form VL 



E 



0. 



^ The resulting Hamiltonian flow is often denoted Hamiltonian 
evolution, or Hamiltonian dynamics. 

* Local Hamiltonian flow is not always adequate: constraints 
of the coordinate functions, for example, require discontinuous 
jumps in momentum otherwise known as specular reflection (Ap- 
pendix |XJ. 

^ The scalar functions considered above are geometric objects in 
their own right and, indeed, the Lie derivative of a scalar field 
agrees with the action of the vector field. 



also vanishes. 



implying that differential volume elements of phase space 
are preserved under the evolution along the integral 
curves. 

Note that, with both the Hamiltonian and the differen- 
tial phase space volume preserved, the phase space den- 
sity F{H)il for any smooth scalar function : IR — ;> M is 
also invariant along integral curves H 

Cj^^F{H)n = {Cx^F{H)) n + F{H)£j.^n 

^F'{H) {c^^H)n 

= 0. 

This property is inherent to the symplectic geometry of 
the cotangent bundle and holds for any choice of Hamil- 
tonian and scalar function F. 



THE IDENTIFICATION OF FORMS AND 
MEASURES 

A top rank form on a n-dimensional manifold is a ten- 
sor, independent of the manifold coordinates. Given a 
particular choice of coordinates {a;*} defined across the 
entire manifold, however, any such top-rank form can be 
decomposed as 

(/. = /x(x) d"x, 

where /x(x) is a scalar function determined by the co- 
ordinates and d^x is the volume form of the coordinates 
constructed by wedging the gradients of each coordinate 
function together, 

d"x = dxi A---Ada;". 



This geometric statement is equivalent to Liouville's theorem in 
statistical mechanics [J. 
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Introducing the object / which evaluates to the appropri- 
ate /x(x) for any given set of coordinate functions {a;*}, 
the decomposition becomes 

c/. = /(x)d"x. 

By construction, the objects /(x) and d"x in the de- 
composition depend on the coordinates: under a change 
of coordinates from {x*} — ?► {-^'} the two terms trans- 
form by acquiring a determinant of the Jacobian matrix 
(9x/5X, 



/(X) 
d"X 



9x 



dX 
9x 



+1 



dX 



/(x) 



d"x. 



These objects are not tensors but rather tensor densities. 
In general a tensor density of weight w transforms a^ 



P(X) 



9x 
dX 



p(x) 



with /(x) and d"x immediately identified as tensor densi- 
ties of weight -t-1 and —1, respectively. Note that, when 
the two objects arc combined, the additional Jacobian 
factors cancel to give a coordinate independent object as 
expected of the original tensor. 

The topological space that forms the manifold At can 
also be endowed with a coordinate independent mea- 
sure that, given coordinates, can be decomposed into two 
coordinate-dependent objects Any Borel measure /x 
can be written as 

dn = fix) d"x, 

where d"x is now the Lebcsguc measure on the chosen 
coordinates and /(x) is the Radon-Nikodym derivative of 
H with respect to d"x. Under a coordinate transforma- 
tion, these two objects acquire a factor of the Jacobian 
just as above, 



/(X): 

d"X 



dX 



+1 



ax 



Indeed the similarity of the two systems is no coincidence: 
provided that the manifold can be oriented, the Riesz 
representation theorem guarantees that the space of top- 
rank forms is locally isomorphic to the space of measures 
0, providing an identification between the form cf) and 
measure fi as well as the terms in their above decomposi- 
tions. If the form is everywhere positive and the integral 



over all of A4 is finite then the corresponding measure 
becomes a normalizable Borel measure from which one 
can build a probabilistic space. 

Note that objects on the cotangent bundle can be 
densities with respect to coordinate transformations on 
T*M, acquiring factors of \d (q, p) /<9 (Q, P)|, or with re- 
spect to transformations on Ai, picking up only factors 
of \dq/dQ\. The point transformations introduced above 
are a special case of symplectomorphisms, which preserve 
the symplectic form u and have Jacobian determinant 
|9 (q, p) /9 (Q, P)| = +1. Therefore, any object trans- 
forming as a density with respect to T*Ai, such as those 
in the above decompositon, are actually invariant under 
transformations that preserve the structure of the cotan- 
gent bundle. 



INTEGRAL CURVES AS MARKOV 
TRANSITION KERNELS 

Because the volume form is nowhere zero, a symplec- 
tic manifold can always be oriented and, consequently, 
the form F (H) can always be identified with a mea- 
sure. Moreover, if F is restricted to positive, integrable 
functions, 



F 



s.t. 



/ F{H)ne 
Jm 



then the corresponding measure will be a Borel measure. 

Taking F (H) ~ exp a given Hamiltonian 

uniquely defines a probability measure 

djU = 7r(q, p) dqdp 

= exp[C-H{q,p)]Q, 

where C £ R is determined by the normalization. Hamil- 
tonian flow maps ^ into itself and thus defines a proper 
Markov transition kernel for the distribution 7r(q, p). 
Note that, in practical applications the dynamics must 
be performed numerically and the measure will not be 
conserved exactly 0. Any subsequent bias, however, can 
be avoided by treating the flow as a Metropolis proposal 
function instead of the transition itself [l|, |3l ■ 
Now if the gradients of the Hamiltonian satisfy 

9'(q. -p) = -gXq,p) 

p, (q, -p) = +Pi(q,p) 

then the resulting flow is reversible: reflecting the mo- 
menta and evolving the same distance along the integral 
curves recovers the initial configuration. By adding such 
a reflection to the end of each evolution, the resulting 



* Other choices for F are possible, but this exponential form has 
The definition of the sign of the weight can vary within the lit- a computational advantage when considering the ubiquitous dis- 

erature; here we use the convention of [Sj. tributions from the exponential family. 
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transition has detailed balance with respect to 7r(q, p), 
which then becomes the unique stationary distribution. 

The augmented transitions, however, do not produce a 
convergent Markov chain because the evolution explores 
only level sets of probability density, 

7r(q, p) = 7r(qo, po) (x exp (-i?(qo, Po)) • 

Movement across the probability contours can be intro- 
duced by adopting a Gibbs strategy and alternating each 
Hamiltonian transition with transitions from any condi- 
tional distribution spanning the contours. The combined 
transitions guarantee ergodicity and, provided that the 
transitions are tuned to avoid cycles in phase space and 
subsequent periodicity, the resulting Markov chain will 
converge to 7r(q, p). 

While any reversible Hamiltonian defines both a prob- 
ability distribution and respective Markov chain, practi- 
cal applications demand the reverse construction: what 
Hamiltonians, and hence well-behaved Markov chains, 
are consistent with a given target distribution 7r(x)? 

Provided a decomposition of the random variables into 
positions and momenta, x = {q, p}, the support of 
the target distribution could be identified with the full 
cotangent bundle and 7r(x = {q, p}) would fully define a 
Hamiltonian. There is no natural motivation for such a 
decomposition, however, and one resulting in a reversible 
Hamiltonian system need not even exist 

If we appeal to the factorization of the cotangent bun- 
dle into {g'} and {pi} coordinates and instead identify 
the random variables with the position manifold M , how- 
ever, then X = q and the target distribution constrains 
only the marginal distribution of 7r(q, p). 



7r(x) = y d"p7r(q,p) 



Any joint distribution defined over the entire cotangent 
bundle factors, 

7r(q,p) = 7r(p|q)7r(q) , 

and the residual freedom in the choice of conditional dis- 
tribution, 7r(p|q), can be engineered to not only guaran- 
tee reversibility but also be easily sampled to ensure that 
the entire procedure remains computationally efficient. 
Once the latent variables p arc marginalized out, the 
Markov chain generates the desired samples from 7r(q). 

This final identification completes the construction of 
a general MCMC procedure: a given target distribu- 
tion 7r(q) and the selection of an appropriate conditional 
distribution, 7r(p|q), defines Hamiltonian dynamics and 
a well-behaved transition kernel. Because the dynam- 
ics incorporate gradients of the target distribution, the 



A trivial example being a distribution defined on an odd- 
dimensional space. 



transitions systematically explore the target distribution 
and avoid the random walk behavior typical of other 
MCMC techniques 0. Moreover, the freedom in the 
conditional distribution offers the potential for includ- 
ing more information about the target distribution and, 
consequently, further improving the performance of the 
resulting Markov chain. 



ADMISSIBLE HAMILTONIANS 

With the above considerations, the most general 
Hamiltonian yielding the desired Markov chain is 



H = -log7r(q,p) + C 
= -log7r(p|q)7r(q) + C 
= -log7r(p|q) - log7r(q) 
= r(q,p) + F(q) + C. 



c 



Note that T and V are neither scalars nor scalar den- 
sities. While the joint distribution 7r(q, p) is invariant 
under point transformations (and indeed all symplecto- 
morphisms), the distributions in the decomposition are 
densities of opposite weight with respect to coordinate 
transformations on the position manifold M., 



7r(P|Q) = 

^(Q) = 



dq 



dQ 
dq 



7r(p|q) 



dQ 



+1 



7r(q) 



and the terms in the Hamiltonian must transform as 

(9q 



r(Q,p) = r(q,p) + iog 



dQ 



ViQ) = F(q) - log 



9q 



dQ 



(4) 
(5) 



When added together the additional factors cancel to 
yield the scalar Hamiltonian. 

While the potential energy ^(q) is fully specified by 
the target distribution, the kinetic energy is constrained 
by only the defining normalization of a conditional dis- 
tribution 

y d"p exp(-r(q,p)) eM+, 

i.e. a finite positive number independent of q, and the 
demands of detailed balance, 

T(q,-p) =T(q,p) 
dT dT 

t;— (q, -p) = -T^(q,p)- 

dpi dpt 

Reversibility is assured for any kinetic energy of the 
form 

r(q, p) = Ti (q, p) + Ti (q, -p) + Ta (q) , 
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where the first two terms can, in general, be decomposed 
into a sum of completely symmetric tensors r(„) of type 
(71, 0) with n even and positive, W\ 



Ti(q,p) + ri(q, -p) 



n=2,4,...ii...j„ 



■•p.„?(ii)-""(q). 



A particularly useful choice is any scalar function of a 
non-dcgcncrate quadratic form in the momenta, 

T(q,p)=r| ^p,p,A^^(q) I + tM) ■ 



Here the tensor A(q), or more appropriately its inverse, 
effectively serves as a spatially-dependent linear trans- 
formation of the coordinates q — if the transformation 
simplifies the structure of the target distribution then 
the resulting Markov chain promises to explore the space 
much more efficiently. In particular, the incorporation of 
the derivatives of the potential can help to locally stan- 
dardize the distribution, avoiding complications due to 
the narrow valleys characteristic of strong correlations. 
Taking t to be the identity gives 



T(q,p) 



2 ^ 

1-3 



1 



KPjA*^(q) - - log I A (q 



(6) 



which results in a gaussian conditional distribution, 

7r(p|q)=Ar(p|0,A). 

Notice how the normalization of the conditional gaussian 
introduces the determinant of A, which transforms as a 
tensor density of weight —2 with respect to coordinate 
transformations on A^[i3 Under a point transformation 
the normalization becomes 



2log|A(Q)| 



iiog|aq/aQr'|A(q) 

1 



= ilog|A(q)| + ^log|aq/9Qr' 

= ilog|A(q)|-log|aq/9Q|, 

introducing exactly the necessary factor of the Jacobian 
to ensure the proper transformation of T . The identifica- 
tion of measures with forms ensures consistency between 
the two perspectives. 

If the covariance S = A"-"^ is proportional to the iden- 
tity then the Hamiltonian reduces to that of classical me- 
chanics and the form used in the first implementations 



In order to satisfy the defining transformation property llJ}, the 
contribution from t\ (q, p) -|- ri (q, — p) must be a scalar and, 
consequently, the Tf^) must be tensors. The second term T2(q) 
is ultimately responsible for the introduction of the necessary 
factor of log |9q/i9Q|. 

Provided that k~\-l is even, the determinant of a rank (fc, /) tensor 
exists and transforms as a scalar density of weight I — k. 



of HMC [l|. A nontrivial but constant covariance ma- 
trix allows for a global rescaling and rotation of the tar- 
get distribution , and a spatially dependent covariance 
transforms the target distribution locally . 

Because integrability requires it to be positive-definite 
and non-degenerate, the spatially-varying covariance can 
also be interpreted as a Riemannian metric and, from 
this perspective, the Hamiltonian evolution locally par- 
allels geodesies of a curved manifold @. This additional 
geometric structure embedded within the symplectic ge- 
ometry creates the potential to further improve perfor- 
mance with the application of more tools from differential 
geometry. Utilizing any such possibility, however, first 
requires a choice of the spatially- varying covariance. 



Choices of the Covariance Matrix 

The Fisher-Rao metric ubiquitous in information ge- 
ometry. 



91og7r(x|y) (91og7r(x|y) 



or given particular coordinate functions 
"W(x|y) 9U(x|y) 



dx'- 



provides an obvious candidate for the covariance matrix, 
and its use in HMC has proven successful when the ex- 
pectation can be performed analytically While the 
Fisher- Rao metric does incorporate derivatives of the tar- 
get distribution, however, it requires integrating over the 
y to ensure positive-definiteness. In a Bayesian appli- 
cation this necessitates expectation of the posterior over 
the ensemble of possible data sets; not only is the expec- 
tation contrary to the Bayesian philosophy of inference 
based solely on the measured data, it also washes out 
the local structure particular to a given data set that 
can prove important in posterior exploration. 

The geometric perspective, however, suggests another 
possibility free from the unwanted marginalization. A 
"background" Riemannian metric cr defined on J\A in- 
duces a metric on the graph of the potential V , EEI 



,(q) 



c)U(q) dV{(\) 



Unfortunately, S does not transform as a proper tensor 
because V is not a proper scalar function. The additional 
structure afforded by cr, however, admits the necessary 
correction: because the determinant \cr\ is a scalar den- 
sity of weight +2 with respect to transformations on A4, 



The graph is an n-dimensional submanifold in the (n + 1)- 
dimensional manifold with coordinates (q, V(q)). 
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the quantity 



CONCLUSIONS AND FUTURE WORK 



T/(q) = y(q) + -log|<T(q)| 



is a true scalar and the modified metric 



Sy(q) = CT,j(q) 



dVjq) dVjci) 



a true tensor. 

Note the similarity of the outer product {diV){djV) 
to the argument of the expectation value in the Fisher- 
Rao metric: this induced metric features a similar struc- 
ture without the undesired expectation. Moreover, S is 
simply a rank-1 update of the background geometry and 
the metric can be efhcicntly inverted with the Sherman- 
Morrison- Woodbury formula Q , 



{d'V){d^V) 



where A is the inverse of a (satisfying y'^Ckj = and 

If cr is homogeneous (i.e. independent of position) then 
the inversions necessary at each iteration of the Hamilto- 



nian evolution can be computed at order O 



signifi- 



cantly faster than the O (??. ) required for the inversion of 
the dense Fisher-Rao metric. The Christoffel coefficients 
of the metric S, which specify the Hamiltonian flow, are 
straightforward to calculate in this case, El 



By considering the geometry of Hamiltonian dynamics 
and the basic constraints of Markov chain Monte Carlo, 
we have constructed the most general approach to Hamil- 
tonian Monte Carlo (HMC). 

The simplest admissible Hamiltonian given in Eq. ^ 
reproduces existing approaches to Hamiltonian Monte 
Carlo, but knowing the most general form begs further 
extensions. Girolami, et al., for example, have considered 
the kinetic energy 



T(q,p) = 



log 1 



E"=iP.P.A'^(q)\ 1 



--log|A(q) 



which gives Student's t-distribution in place of the gaus- 
sian, but with only mixed results. Other choices of the ki- 
netic energy may dramatically improve HMC for certain 
distributions, and there may still be means to improve 
the performance universally. 

Generalizing the symplectic manifold of HMC to a 
Poisson manifold [10] offers even more possibilities. 
Without the restriction of non-degenerate forms, the 
Poisson manifold can accommodate distributions with 
spatially varying dimensionality and perhaps even admit 
trans-dimensional Monte Carlo. 

Given the unexplored possibilities of this geometric 
perspective, the future of HMC is promising. 
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id' v){d,dkV) 
i + {d'v){diV) 



also requiring only 0{'n?) operations and allowing the en- 
tire Hamiltonian flow to be calculated at the same order. 
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position independent, 8iV = diV and the metrics coincide, S = 
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APPENDIX A: SPECULAR REFLECTION FOR 
GENERAL HAMILTONIANS 

The explicit incorporation of inequality constraints is 
often advantageous, and sometimes even necessary, in 
the construction of certain Markov chains. Arbitrary in- 
equality constraints, 

C(q) > 0, 

can be incorporated into the Hamiltonian framework 
with the introduction of an infinite potential [TTj , 



oo, C(q) < 
0, else 



A naive implementation of Hamiltonian dynamics with 
such a potential, however, immediately fails because the 
gradient dy(q) along the boundary is undefined. 
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When considering the classical mechanics of a point 
particle, H = ^ '^■■PiPj6'^^+V{q), the difhculties around 
the potential barrier can be avoided by appealing to the 
exact result: the components of momentum perpendicu- 
lar to the constraint surface reflect while preserving the 
value of the Hamiltonian. This specular reflection is given 
by 



Ap 



-2 (p,n)n, 



(7) 



where the unit normal along the surface of equality 
(C(q) = 0) is 



dC 



n = 



and the inner product of one-forms is induced by the 
symmetric (2,0) tensor S, 

(a, b) = ^ GibjS'^ = ^ aA . 



The reflection depends only on the direction of dC and 
not the undefined dV. Also note that only the com- 
ponents of the momentum perpendicular to level sets of 
C(q) transform; those parallel to level sets arc unaffected. 

Generalizing specular reflection to a general Hamilto- 
nian system is straightforward. The gradient n = dC 
defines a unique one-form at the surface of constraint 
equality C(q) = 0, and forms a basis with the addition 
of (n — 1) one-forms w*, i G {1, . . . ,n — 1}, each linearly 
independent of one another and n. With this basis, the 
momentum p at a point of reflection q on the constraint 
boundary may be decomposed as 



p n 



Because the reflection is determined entirely by the sur- 
face C(q) = 0, the momentum can transform only along 
the one-dimensional subspace spanned by n. Conse- 
quently, the pll should be invariant and a general reflec- 
tion is given by 



Tl-l 



n-l 



p =p-^n+ ^pfw' ^ p' = ap^n + ^pju}\ 



along with the condition that the Hamiltonian remains 
invariant, i7(q, p') = i/(q, p). 

In the case of a quadratic kinetic energy, 
fl"(q,P) = i ^P.P,A^^ (q) - 1 log |A(q)| + T/(q) , 



conservation of the Hamiltonian requires 
= i/(q,p')-ff(q,p) 



{a-l)p^ 



i(a + l)p^5](dC),(dC)^.A'- 



71- 



+ 



k—l ij 



(dC), ioj%A^^ 



where (dC)j and (w'')^. are the components of the respec- 
tive one-forms in an arbitrary basis. 

Ignoring the a — 1 solution where the momentum is 
unchanged, 

or with a bit of manipulation 

, S.,-(dC),p,-A'^- 
P^E,,(dC),(dC)^.A'^' 

implying that the change in momentum is 

Ap = (a — l)p^n 

„ E^AdC%p,A^^ 



E,, (dC), (dc)^. A^^- ■ 

Defining an inner product induced by the tensor A, 

and a unit one-form under this inner product 

a 



a = 



the generalized specular reflection becomes 
Ap = 2 (n,p)^n. 



(8) 



The resemblance of Eq. ([5]) to Eq. ([7]) is welcome. As 
before, the dependence of the result on only the direction 
of n ensures that the infinite magnitude of the potential 
barrier provides no difficulties. Moreover, the general 
result not only reduces to the classical case for A*^ = 
i5*^ , it also agrees with the generalization that would be 
expected given the interpretation of the quadratic kinetic 
energy resulting from a Ricmannian (covariant) metric 
on the base manifold (where as before S = A"-'^). 

A straightforward calculation verifies that Eq. ([8]) is 
also a sufficient reflection solution for any kinetic energy 
of the form 



T(q,p)=rK]p,p,A^^(q) + r^iq) 
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